Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. However, a significant performance gap remains between NMT and SiMT. In this work, we propose to leverage monolingual data to improve SiMT: we train a SiMT student on the combination of bilingual data and external monolingual data distilled by Seq-KD. Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh). Inspired by the behavior of human simultaneous interpreters, we propose a novel monolingual sampling strategy for SiMT that considers both chunk length and monotonicity. Experimental results show that our sampling strategy consistently outperforms random sampling (and other conventional NMT monolingual sampling strategies) by avoiding the key problem of SiMT, hallucination, and that it scales better. We achieve an average improvement of +0.72 BLEU over random sampling on En-Zh and En-Ja. Data and code can be found at https://github.com/hexuandeng/Mono4SiMT.
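Successful machine-learning-based named entity recognition (NER) models can fail on texts from some special domains, such as Chinese addresses and e-commerce titles, which require adequate background knowledge. Such texts are also difficult for human annotators. In practice, we can obtain potentially useful information from related texts that share some common entities to help text understanding. One can then easily reason out the correct answer by referring to the related samples. In this paper, we propose to enhance NER models with related samples. We draw related samples from large-scale in-domain unlabeled data with a sparse BM25 retriever. To explicitly simulate the human reasoning process, we perform training-free entity type calibration by majority voting. To capture relatedness features in the training stage, we propose to model related samples with a Transformer-based multi-instance cross-encoder. Empirical results on datasets from the two domains above show the efficacy of our methods.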
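The training-free calibration step can be pictured with a minimal sketch. It assumes the related samples have already been retrieved (for example with BM25) and tagged by the same NER model. The function name `calibrate_entity_types`, the `(mention, type)` data layout, and the rule that the original prediction wins ties are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def calibrate_entity_types(predictions, related_predictions):
    """Re-label each predicted mention by majority vote.

    predictions: list of (mention, entity_type) for the target text.
    related_predictions: list of (mention, entity_type) predicted on
        the retrieved related samples (hypothetical layout).
    """
    # Collect the types that related samples assign to each mention string.
    votes = {}
    for mention, entity_type in related_predictions:
        votes.setdefault(mention, Counter())[entity_type] += 1

    calibrated = []
    for mention, entity_type in predictions:
        counter = Counter(votes.get(mention, {}))
        counter[entity_type] += 1  # the original prediction also casts a vote
        best_count = max(counter.values())
        if counter[entity_type] == best_count:
            # Assumption: keep the original type when it ties for the lead.
            calibrated.append((mention, entity_type))
        else:
            winner = max(counter, key=counter.get)
            calibrated.append((mention, winner))
    return calibrated


predictions = [("West Lake Road", "poi"), ("Hangzhou", "city")]
related = [("West Lake Road", "road"), ("West Lake Road", "road"), ("Hangzhou", "city")]
calibrated = calibrate_entity_types(predictions, related)
```

In this toy input the mention "West Lake Road" is outvoted two to one, so its type changes, while "Hangzhou" keeps its original type.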
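Deep-learning-based NLP models have been found to be vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issue of robustness needs to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study the safe regions of a model, we introduce the robustness radius, which is the boundary within which the model can resist any perturbation. Because calculating the maximum robustness radius is computationally hard, we estimate its upper and lower bounds. We repurpose attack methods as ways of seeking an upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. A verification method is then used for the lower bound. Further, to evaluate robustness in regions outside the safe radius, we reexamine robustness from another view: quantification. We introduce a robustness metric with a rigorous statistical guarantee to quantify adversarial examples, which indicates the model's susceptibility to perturbations outside the safe radius. The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions, yet generalize well in the presence of real-world noise.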
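So far, named entity recognition (NER) has involved three major types, namely flat, overlapped (nested), and discontinuous NER, which have mostly been studied individually. Recently, there has been growing interest in unified NER, which tackles the above three tasks concurrently with a single model. The current best-performing methods are mainly span-based and sequence-to-sequence models; unfortunately, the former focus only on boundary identification and the latter may suffer from exposure bias. In this work, we present a novel alternative by modeling unified NER as word-word relation classification, namely W^2NER. The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words with Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. Based on the W^2NER scheme, we develop a neural framework in which unified NER is modeled as a 2D grid of word pairs. We then propose multi-granularity 2D convolutions to better refine the grid representations. Finally, a co-predictor is used to reason sufficiently about the word-word relations. We perform extensive experiments on 14 widely used benchmark datasets for flat, overlapped, and discontinuous NER (8 English and 6 Chinese datasets), where our model beats all current top-performing baselines, pushing the state-of-the-art performance of unified NER.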
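To make the word-pair grid concrete, here is a minimal sketch of one plausible label encoding. It assumes that NNW links each entity word to the next word of the same entity and that THW-* links the tail word of an entity back to its head word, tagged with the entity type. The function name `build_relation_grid`, the `"NONE"` filler label, and the `"SYM"` type tag are hypothetical choices for this illustration, not taken from the authors' code.

```python
def build_relation_grid(num_words, entities):
    """Encode entities as word-word relation labels on an N x N grid.

    entities: list of (word_indices, entity_type); word_indices are the
        ordered positions of the entity's words and may skip positions
        for discontinuous entities.
    """
    grid = [["NONE"] * num_words for _ in range(num_words)]
    for word_indices, entity_type in entities:
        # NNW: row = current word, column = next word of the same entity.
        for cur, nxt in zip(word_indices, word_indices[1:]):
            grid[cur][nxt] = "NNW"
        # THW-*: row = tail word, column = head word, tagged with the type.
        head, tail = word_indices[0], word_indices[-1]
        grid[tail][head] = "THW-" + entity_type
    return grid


# "I am having aching in legs and shoulders"
#  0  1    2      3    4   5    6      7
entities = [
    ([3, 4, 5], "SYM"),  # "aching in legs"
    ([3, 4, 7], "SYM"),  # "aching in shoulders" (discontinuous)
]
grid = build_relation_grid(8, entities)
```

Because NNW cells always sit above the diagonal and THW-* cells on or below it, the two relation families never compete for the same cell in this encoding, and the two overlapping, discontinuous mentions share the `(3, 4)` NNW cell.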
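Unified opinion role labeling (ORL) aims to detect, given a text, all possible "opinion-holder-target" opinion structures in one shot. Unfortunately, the existing transition-based unified method struggles with longer opinion terms and fails to solve the term overlap issue. The current best performance has been achieved by a span-based graph model, which however still suffers from high model complexity and insufficient interaction between opinions and roles. In this work, we investigate a novel solution by revisiting the transition architecture and augmenting it with a pointer network (PointNet). The framework parses out all opinion structures in linear time complexity, while PointNet removes the limitation on term length. To achieve explicit opinion-role interactions, we further propose a unified dependency-opinion graph (UDOG), which jointly models the syntactic dependency structure and the partial opinion-role structure. We then devise a relation-centered graph aggregator (RCGA) to encode the multi-relational UDOG, where the resulting high-order representations are used to promote the predictions in the vanilla transition system. Our model achieves new state-of-the-art results on the MPQA benchmark. Analyses further demonstrate the superiority of our methods in both efficacy and efficiency.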
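Crowdsourcing is regarded as a prospective solution for effective supervised learning, aiming to build large-scale annotated training data with crowd workers. Previous studies focus on reducing the influence of noise in crowdsourced annotations. In this work, we take a different view, regarding all crowdsourced annotations as gold-standard with respect to the individual annotators. In this way, we find that crowdsourcing can be highly similar to domain adaptation, so that recent advances in cross-domain methods can be applied to crowdsourcing almost directly. Here we take named entity recognition (NER) as a case study, and propose an annotator-aware representation learning model inspired by domain adaptation methods that attempt to capture effective domain-aware features. We investigate both unsupervised and supervised crowdsourcing learning, assuming that no or only small-scale expert annotations are available. Experimental results on a benchmark crowdsourced NER dataset show that our method is highly effective, leading to new state-of-the-art performance. In addition, under the supervised setting, we can achieve impressive performance with only a very small amount of expert annotation.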
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, or not at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
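As a rough illustration of what distilling token relations can look like, the sketch below compares token-to-token similarity matrices of a teacher and a student. The abstract does not specify the relation or the loss, so the definition used here (a row-wise softmax over scaled dot products, matched with KL divergence) is an assumption, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_relations(tokens):
    """Return an N x N matrix: softmax of scaled dot products between tokens."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(a * b for a, b in zip(t_i, t_j)) / scale for t_j in tokens])
        for t_i in tokens
    ]


def relation_distillation_loss(teacher_tokens, student_tokens, eps=1e-12):
    """Mean KL(teacher relations || student relations) over token rows."""
    t_rel = token_relations(teacher_tokens)
    s_rel = token_relations(student_tokens)
    total = 0.0
    for t_row, s_row in zip(t_rel, s_rel):
        total += sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(t_row, s_row))
    return total / len(t_rel)


# Teacher tokens are 4-dimensional, student tokens 2-dimensional.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
student = [[1.0, 0.0], [0.0, 1.0]]
loss = relation_distillation_loss(teacher, student)
```

One property of this formulation is that the relation matrix is N x N regardless of feature width, so a narrow student can be matched against a wide teacher without an extra projection layer.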
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
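The idea of stamping triggers onto raw data before distillation starts can be sketched as follows. The abstract does not describe the trigger shape, its position, the poisoning rate, or how labels are handled, so the bottom-right square patch, the `every_nth` selection rule, and the relabeling to `target_label` are all assumptions for illustration, and `stamp_trigger` and `poison_dataset` are hypothetical names.

```python
def stamp_trigger(image, patch_size=2, value=1.0):
    """Return a copy of a 2D image with a square patch in the bottom-right corner."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, every_nth=10):
    """Stamp the trigger on every n-th sample and relabel it (assumed policy)."""
    out_images, out_labels = [], []
    for idx, (image, label) in enumerate(zip(images, labels)):
        if idx % every_nth == 0:
            out_images.append(stamp_trigger(image))
            out_labels.append(target_label)
        else:
            out_images.append([row[:] for row in image])
            out_labels.append(label)
    return out_images, out_labels


# Three blank 4x4 "images"; samples 0 and 2 receive the trigger.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
labels = [0, 1, 2]
poisoned_images, poisoned_labels = poison_dataset(images, labels, target_label=7, every_nth=2)
```

The resulting set would then be handed to a dataset distillation procedure in place of the clean data. An iteratively updated trigger, as described for DOORPING, would need an optimization loop that is not shown here.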
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation in image content, which complicate distortion patterns across different scales and make the BIQA regression problem harder. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and to better optimize the regression problem in line with the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets. The experimental results indicate that PMT-IQA outperforms the compared approaches and that both the MS and PMT modules improve the model's performance.